Journal: bioRxiv
Article Title: OmniCell: Unified Foundation Modeling of Single-Cell and Spatial Transcriptomics for Cellular and Molecular Insights
doi: 10.64898/2025.12.29.696804
Figure Legend Snippet: An overview of the OmniCell model, its pretraining data, architecture, downstream tasks, and ablation study. (A) Composition of pretraining data. OmniCell was pretrained on 67 million cells comprising single-cell RNA sequencing (scRNA-seq; 57.8%) and Stereo-seq spatial transcriptomics (ST; 42.2%) data across diverse tissues and cell types. (B) Model architecture. OmniCell integrates scRNA-seq and ST data within a unified framework. For scRNA-seq, cells are encoded as ordered gene sequences based on expression. For ST, spatial context is incorporated through neighborhood graphs capturing local cellular relationships. Gene expression values are normalized via soft-rank transformation. A mixture-of-experts (MoE) gene-aware value embedding module adaptively encodes expression levels. The architecture comprises 10 Transformer layers, with a symmetric bilinear output module that jointly models cell–gene relationships to generate unified embeddings. (C) Downstream tasks. Model performance was assessed on ST-specific tasks (spatial clustering, domain identification, deconvolution, and gene module analysis) and scRNA-seq tasks (cell clustering, cell-type annotation, gene module analysis, batch correction, and marker gene identification). (D) Ablation analysis. Systematic removal of model components (including the MoE value embedder, spectral subspace projection, symmetric bilinear module, and ST pretraining data) quantifies their individual contributions to model performance.
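The two input encodings named in panel (B), expression-ordered gene sequences and spatial neighborhood graphs, can be illustrated with a minimal Python sketch. This is not OmniCell's implementation: the function names `rank_order_genes` and `knn_neighbors`, the choice to drop unexpressed genes, and the use of a k-nearest-neighbor graph under Euclidean distance are all assumptions made for illustration, since the legend does not specify how the ordering or the graph is built.

```python
import numpy as np


def rank_order_genes(expression, gene_names):
    """Return expressed gene names sorted by descending expression.

    Hypothetical helper: dropping zero-expression genes is an assumption.
    """
    expression = np.asarray(expression, dtype=float)
    # Stable sort keeps the original gene order among ties.
    order = np.argsort(-expression, kind="stable")
    return [gene_names[i] for i in order if expression[i] > 0]


def knn_neighbors(coords, k):
    """Return an (n_cells, k) array with each cell's k nearest neighbors.

    Hypothetical helper: a kNN graph with Euclidean distance is an assumption;
    a radius-based graph would be an equally plausible reading.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


# Toy example: one cell with five genes, and four cells on a line.
genes = ["A", "B", "C", "D", "E"]
sequence = rank_order_genes([0, 5, 2, 0, 9], genes)
neighbors = knn_neighbors([[0, 0], [1, 0], [5, 0], [6, 0]], k=1)
```

The dense pairwise distance matrix is only practical for small toy inputs; a tissue section with many cells would likely need a spatial index instead.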
Article Snippet: Spatially resolved transcriptomic data were generated primarily using the STOmics Stereo-seq whole-transcriptome platform.
Techniques: RNA Sequencing, Expressing, Gene Expression, Transformation Assay, Marker